https://app.emaze.com/@ACQLOZFI#1

Introduction

The sound of music has changed as society has changed, so it is in many ways a reflection of our cultural evolution. Modern society blends far more diverse cultures than it did in the past, and this shows in how music sounds today. But what about the lyrics? Have the words used in lyrics also changed over time? Since music has become more diverse, it seems plausible that lyrics have become more complicated too. Let’s find out by analyzing the lyrics data.

1. Word Count: How the number of words has changed over the years

First, we will see whether the number of words in lyrics has changed over time. If it has increased, that would be one piece of evidence that songs have become more complicated.
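As a rough illustration of this step, the word counts and their quartiles per decade could be computed as in the Python sketch below. The record layout `(year, lyrics)` and the function names are hypothetical and are not taken from the original analysis; words are simply split on whitespace, which is an assumption.

```python
from collections import defaultdict
from statistics import quantiles


def decade_of(year):
    # 1987 -> 1980, 2004 -> 2000
    return (year // 10) * 10


def word_count(lyrics):
    # Assumption: a "word" is any whitespace-separated token
    return len(lyrics.split())


def quartiles_by_decade(records):
    """records: iterable of (year, lyrics) pairs (hypothetical layout).

    Returns {decade: (q1, median, q3)} of the per-song word counts.
    Decades with fewer than two songs are skipped, because quartiles
    are not meaningful for a single value.
    """
    counts = defaultdict(list)
    for year, lyrics in records:
        counts[decade_of(year)].append(word_count(lyrics))

    result = {}
    for decade, values in sorted(counts.items()):
        if len(values) >= 2:
            q1, median, q3 = quantiles(values, n=4, method="inclusive")
            result[decade] = (q1, median, q3)
    return result
```

Comparing the three quartiles across decades is the numeric counterpart of reading the boxes in a boxplot.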

1-1. Number of words in lyrics over decades

The boxplot shows that the quantiles of the number of words have not changed much. However, the number of outliers increases over time. This may simply be because there are more songs in the data for the 2000s and 2010s, so more analysis is needed to find the reason.

1-2. Percentage of each genre over decades

As the plot shows, “Rock”, “Pop”, and “Jazz” were the three dominant genres in the 1970s and 1980s. From the 1990s, the share of “Hip-Hop” begins to increase, as do the shares of several other genres, while the shares of “Jazz” and “Rock” begin to decrease. “Hip-Hop” usually has more words in its lyrics than other genres, so its growing share may be one reason for the outliers seen above.
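The genre shares discussed here can be sketched in a few lines of Python. This is only an illustration under the assumption that each song is a `(year, genre)` pair; the function name is hypothetical.

```python
from collections import Counter, defaultdict


def genre_share_by_decade(records):
    """records: iterable of (year, genre) pairs (hypothetical layout).

    Returns {decade: {genre: percent of that decade's songs}}.
    """
    per_decade = defaultdict(Counter)
    for year, genre in records:
        per_decade[(year // 10) * 10][genre] += 1

    shares = {}
    for decade, counter in sorted(per_decade.items()):
        total = sum(counter.values())
        shares[decade] = {g: 100.0 * n / total for g, n in counter.items()}
    return shares
```

Normalizing within each decade matters here: the later decades contain far more songs, so raw counts alone would make every genre look like it is growing.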

1-3. Number of words in lyrics per genre

This dot plot shows the number of words in each song, grouped by genre and time period. As expected, most of the songs with very high word counts are Hip-Hop and modern Rock.

Songs with very high word counts clearly appear more often in recent years. However, as the boxplot above showed, the quantiles have barely changed, which means the typical number of words has stayed about the same. So it is hard to say from this analysis alone that the lyrics of modern songs have become more complicated. We need to look further.

2. Topic Modeling: What words are being used more often

Now let’s see whether the most popular words in lyrics have changed over time. From here on, I divide the timeline into two periods, ‘Before 2000’ and ‘After 2000’, representing older and modern songs respectively.

2-1. Word Cloud, Before 2000


In the word cloud for songs before 2000, the most frequently used word is “love”, and most of the words look very positive.

2-2. Word Cloud, After 2000


In the word cloud for songs after 2000, the most frequently used word is also “love”. The difference from the previous word cloud is that more negative words appear, such as “die”, “tear”, and “wrong”, and even some swearing. The word “world” is also used much more often than before.

2-3. Top 10 Most Frequent Words

Top 10 Most Frequent Words, Before 2000
Row     Word    Freq
9837    love    12126
17036   time    5628
1124    baby    5224
19227   youre   4895
8277    ill     3694
4198    day     3543
11279   night   3497
7624    heart   3077
9563    life    3015
8672    ive     3009

Top 10 Most Frequent Words, After 2000
Row     Word    Freq
53162   love    181926
91242   time    107374
103530  youre   95775
6177    baby    72401
44463   ill     68221
21983   day     62033
51667   life    59782
40701   heart   55775
46470   ive     54275
61429   night   51518

However, the top 10 words are the same in both periods; only their order differs slightly. Perhaps this is because most songs, past and present, are about love.
From this analysis, we find that the most popular words have barely changed, while negative words began to show up more often in songs after 2000.
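A frequency table like the ones above can be produced with a simple counter. The sketch below is an assumption-laden illustration, not the original pipeline: the tiny stopword list is hypothetical, songs from the year 2000 itself are assigned to the ‘After 2000’ group (the source does not say which side they fall on), and apostrophes are dropped, which would explain forms such as “youre” and “ive” in the tables.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only
STOPWORDS = {"the", "a", "and", "i", "you", "me", "my", "to", "is", "it"}


def tokenize(lyrics):
    # Lowercase and drop apostrophes so "you're" becomes "youre",
    # then keep runs of letters and filter out stopwords
    cleaned = lyrics.lower().replace("'", "").replace("\u2019", "")
    return [w for w in re.findall(r"[a-z]+", cleaned) if w not in STOPWORDS]


def top_words(records, n=10):
    """records: iterable of (year, lyrics) pairs (hypothetical layout).

    Returns (before_2000, after_2000), each a list of (word, count)
    sorted from most to least frequent.
    """
    before, after = Counter(), Counter()
    for year, lyrics in records:
        # Assumption: the year 2000 itself belongs to the "after" group
        target = before if year < 2000 else after
        target.update(tokenize(lyrics))
    return before.most_common(n), after.most_common(n)
```

Because the two periods contain very different numbers of songs, comparing ranks (or relative frequencies) is more informative than comparing the raw counts directly.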

3. Readability: Harder to read?

In this part, I wanted to know whether the readability of lyrics has gotten worse over time. As readability measures, I used “Bormuth.MC”, “Coleman”, “Dale.Chall”, and “Flesch”. For all four measures as used here, a higher score means the text is easier to read.
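To make the idea of a readability score concrete, here is a minimal Python sketch of one of these measures, the Flesch Reading Ease score, which is defined as 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words). The syllable counter is a crude vowel-group heuristic, and treating each lyric line as one sentence is an assumption made because lyrics rarely contain sentence punctuation. The scores in this report were not necessarily computed this way.

```python
import re


def count_syllables(word):
    # Rough heuristic (assumption): one syllable per group of
    # consecutive vowels, and at least one syllable per word
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_from_counts(words, sentences, syllables):
    # Standard Flesch Reading Ease formula; higher means easier to read
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_reading_ease(lyrics):
    # Assumption: each non-empty line of the lyrics is one "sentence"
    lines = [ln for ln in lyrics.splitlines() if ln.strip()]
    words = re.findall(r"[A-Za-z']+", " ".join(lines))
    if not lines or not words:
        raise ValueError("lyrics must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return flesch_from_counts(len(words), len(lines), syllables)
```

Short lines made of short words score high, while long lines packed with multi-syllable words score low, which is the direction of the scale assumed throughout this section.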

3-1. Readability over time


(Darker cells represent higher scores.) The heatmap shows the readability scores of songs decreasing over time, which means the lyrics are getting harder to read. Only the “Coleman” measure shows a different result, probably because its formula weights different features of the text than the other measures do. The readability scores of lyrics written in the 1970s are especially high. Songs from that era were easier to read, and therefore probably easier to understand while listening. What could explain this decline in readability over time? To find out, I analyzed the readability score of each genre.

3-2. Readability of each Genre


In the analysis of genre percentages over time, the biggest changes were in the shares of “Jazz”, “Rock”, and “Hip-Hop”. “Jazz” and “Rock” show moderate readability scores, so the change in their shares probably had little effect on overall readability. “Hip-Hop”, however, shows a somewhat low readability score, so its growing share may have contributed to the lower readability of modern lyrics. So far, it seems reasonable to say that songs made in recent years use more complicated lyrics: some have many more words, and on average they are harder to read.

4. Sentiment Analysis: Positive vs Negative

In the topic analysis above, more negative words appeared in the word cloud of lyrics written after 2000, but we did not have any numerical value for this. So I ran a sentiment analysis, which scores how positive or negative each word is. By averaging these scores over the songs in each period, we can compare the two periods, before 2000 and after 2000.
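The scoring idea can be sketched in Python as follows. Everything specific here is an assumption for illustration: the five-word lexicon and its values are hypothetical, and the choice to sum word scores per song and then average the songs within each genre is one plausible reading of the procedure, not a confirmed description of it.

```python
from collections import defaultdict

# Hypothetical toy lexicon: positive values for positive words,
# negative values for negative words
LEXICON = {"love": 3, "happy": 3, "die": -3, "wrong": -2, "tear": -1}


def song_sentiment(lyrics, lexicon=LEXICON):
    # Sum the scores of all words found in the lexicon; unknown words score 0
    return sum(lexicon.get(w, 0) for w in lyrics.lower().split())


def mean_sentiment_by_genre(records, lexicon=LEXICON):
    """records: iterable of (genre, lyrics) pairs (hypothetical layout).

    Returns {genre: mean per-song sentiment score}.
    """
    scores = defaultdict(list)
    for genre, lyrics in records:
        scores[genre].append(song_sentiment(lyrics, lexicon))
    return {g: sum(v) / len(v) for g, v in scores.items()}
```

Under this scheme, a long song full of negative words gets a strongly negative total, so genres with longer lyrics can reach larger absolute scores than genres with short lyrics.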

4-1. Sentiment Analysis: Before 2000

Average Sentiment for Each Genre, Before 2000
Genre        Mean sentiment
Country      -2.77
Electronic   -0.86
Folk         -1.49
Hip-Hop      -9.39
Jazz          2.91
Metal        -9.09
Other         4.04
Pop           1.48
R&B           1.00
Rock         -2.81
Mean         -1.70

In the sentiment analysis of lyrics written before 2000, “Hip-Hop” and “Metal” are the two genres with the most negative sentiment, and “Other” and “Jazz” are the two with the most positive sentiment. All other genres have sentiment scores between -3 and 3. The overall mean, taken as the unweighted average of the genre means, is -1.70, which is slightly negative.

4-2. Sentiment Analysis: After 2000

Average Sentiment for Each Genre, After 2000
Genre        Mean sentiment
Country      -0.59
Electronic   -2.02
Folk         -2.87
Hip-Hop      -13.59
Indie        -2.76
Jazz          1.77
Metal        -9.81
Other        -5.40
Pop          -0.38
R&B          -0.42
Rock         -3.37
Mean         -3.59

In the sentiment analysis of lyrics written after 2000, “Hip-Hop” and “Metal” are again the two genres with the most negative sentiment, and “Jazz” is the only genre with a positive score. Every other genre is negative. Even “Other”, which had the highest sentiment score before 2000 (4.04), drops to -5.40, and “Pop” and “R&B” move from positive to slightly negative. The overall mean falls from -1.70 to -3.59. This analysis therefore gives clearer evidence that lyrics written after 2000 use more negative words. I think this can make the lyrics harder for some groups of listeners to understand.

5. Summary

After doing exploratory data analysis on the lyrics data, I got the following results.

Hence, the lyrics of songs are actually getting more complicated and harder to understand, and certain groups of people may find them even harder to follow.